A Cross-Lingual Spoken Content Search System
Authors
Abstract
This paper presents an approach to enabling audio search in languages for which training an automatic speech recognition (ASR) system is difficult owing to a lack of training resources. Our work is related to previous approaches that address the problem of searching for out-of-vocabulary terms. A phonetic recognizer is used to convert the audio data into phonetic lattices. In the proposed approach, the acoustic models (AM) of the phonetic recognizer are trained on a base language for which training data is available, and are then used to search content in a similar language. A phonetic language model (PLM) is trained for each language independently, using text data from a variety of sources including the web. We performed experiments to evaluate this approach on searching a Gujarati corpus, with the AM trained on an Indian-English corpus. The experimental results show that this approach achieves a precision at 10 (P@10) of up to 0.65.
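The abstract does not give implementation details, so the following is only an illustrative sketch of phone-level search scored with P@10, built on simplifying assumptions that are not from the paper: each utterance is represented by a single 1-best phone string rather than a phonetic lattice, the query has already been converted to phones, and matching uses a plain unit-cost edit distance over any span of the utterance (a swapped-in technique, not necessarily the authors' method). The phone symbols, utterance IDs and helper names (`approx_substring_distance`, `rank_utterances`, `precision_at_k`) are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def approx_substring_distance(query: Sequence[str], doc: Sequence[str]) -> int:
    """Smallest unit-cost edit distance between `query` and any contiguous
    span of `doc` (the match may start and end anywhere in `doc`)."""
    m = len(query)
    # prev[i]: cost of aligning query[:i] so that it ends just before the
    # current doc phone; before any doc phone, that means deleting i phones.
    prev = list(range(m + 1))
    best = prev[m]
    for phone in doc:
        cur = [0] * (m + 1)  # cur[0] = 0: a match may begin at any position
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (query[i - 1] != phone)
            skip_doc_phone = prev[i] + 1        # extra phone in the document
            skip_query_phone = cur[i - 1] + 1   # query phone missing in doc
            cur[i] = min(substitute, skip_doc_phone, skip_query_phone)
        best = min(best, cur[m])
        prev = cur
    return best


def rank_utterances(query: Sequence[str],
                    index: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Rank utterance IDs by match cost (lower is better), ties broken by ID."""
    scored = [(utt_id, approx_substring_distance(query, phones))
              for utt_id, phones in index.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


def precision_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant (P@10 when k = 10).
    Divides by k even if fewer than k results are returned."""
    return sum(1 for utt_id in ranked_ids[:k] if utt_id in relevant) / k


if __name__ == "__main__":
    # Hypothetical 1-best phone strings for three utterances.
    index = {
        "utt1": "h e l o n a m a s t e d o s t".split(),
        "utt2": "n a m a s k a r".split(),
        "utt3": "k e m c h o".split(),
    }
    query = "n a m a s t e".split()
    ranking = rank_utterances(query, index)
    print(ranking)  # [('utt1', 0), ('utt2', 2), ('utt3', 6)]
    ids = [utt_id for utt_id, _ in ranking]
    print(precision_at_k(ids, relevant={"utt1", "utt2"}))  # 0.2
```

Approximate rather than exact matching lets a phone-level index tolerate recognition errors, which are likely when the AM come from a different base language; a lattice-based system would additionally consider alternative phone paths instead of only the 1-best string.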
Similar resources
An Engine for Online Video Search in Large Archives of the Holocaust Testimonies
In this paper we present an online system for cross-lingual lexical (full-text) searching in the large archive of the Holocaust testimonies. Video interviews recorded in two languages (English and Czech) were automatically transcribed and indexed in order to provide efficient access to the lexical content of the recordings. The engine takes advantage of the state-of-the-art speech recognition s...
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...
Modern Multilingual and Cross-lingual Information Access Technologies
In this chapter, we describe state-of-the-art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...
TIPS: A Translingual Information Processing System
Searching online information is increasingly a daily activity for many people. The multilinguality of online content is also increasing (e.g. the proportion of English web users, which has been decreasing as a fraction of the increasing population of web users, dipped below 50% in the summer of 2001). To improve the ability of an English speaker to search multilingual content, we built a system th...
Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search
Cross Language Information Retrieval (CLIR) systems are a valuable tool to enable speakers of one language to search for content of interest expressed in a different language. A group for whom this is of particular interest is bilingual Arabic speakers who wish to search for English language content using information needs expressed in Arabic queries. A key challenge in CLIR is crossing the lan...